Fuzzy lexical matching

Authors

  • Marc Schoolderman
  • Kees Koster
  • Marc Seutter
Abstract

Being able to automatically correct spelling errors is useful in cases where the set of documents is too vast to involve human interaction. In this bachelor's thesis, we investigate an implementation that attempts to perform such corrections using a lexicon and an edit distance measure. We compare the familiar Levenshtein and Damerau-Levenshtein distances to modifications where each edit operation is assigned an individual weight. We find that the primary benefit of using this form of edit distance over the original is not a higher rate of correction, but a lower susceptibility to false friends. However, deriving the correct weights for each edit operation turns out to be a harder problem than anticipated. While a weighted edit distance can theoretically be implemented effectively, a deeper analysis of the costs of edit operations is necessary to make such an approach practical.
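
To make the idea of a weighted edit distance concrete, the following is a minimal sketch in Python and not the implementation from the thesis. It uses the restricted Damerau-Levenshtein variant (often called optimal string alignment), in which only adjacent characters may be transposed. The function names, the cost callbacks, the `max_distance` threshold, and every weight used here are illustrative assumptions; the abstract does not state which weights were used, and it notes that deriving good ones is the hard part.

```python
from typing import Callable, Iterable, Optional


def weighted_edit_distance(
    source: str,
    target: str,
    sub_cost: Callable[[str, str], float] = lambda a, b: 1.0,
    ins_cost: Callable[[str], float] = lambda c: 1.0,
    del_cost: Callable[[str], float] = lambda c: 1.0,
    swap_cost: Optional[float] = 1.0,
) -> float:
    """Edit distance where each operation has its own (hypothetical) weight.

    With the default unit costs this is the restricted Damerau-Levenshtein
    distance; with swap_cost=None it reduces to plain Levenshtein.
    """
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost(source[i - 1])
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost(target[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = source[i - 1], target[j - 1]
            best = min(
                dist[i - 1][j] + del_cost(a),  # delete a
                dist[i][j - 1] + ins_cost(b),  # insert b
                dist[i - 1][j - 1] + (0.0 if a == b else sub_cost(a, b)),
            )
            # Transposition of two adjacent characters ("ab" -> "ba").
            if (swap_cost is not None and i > 1 and j > 1 and a != b
                    and a == target[j - 2] and source[i - 2] == b):
                best = min(best, dist[i - 2][j - 2] + swap_cost)
            dist[i][j] = best
    return dist[n][m]


def correct(word: str, lexicon: Iterable[str],
            max_distance: float = 2.0, **costs) -> Optional[str]:
    """Return the closest lexicon entry, or None if nothing is close enough.

    Rejecting matches above max_distance (an assumed threshold) is one simple
    way to avoid "correcting" a word to an unrelated lexicon entry.
    """
    best_word, best_dist = None, float("inf")
    for candidate in lexicon:
        d = weighted_edit_distance(word, candidate, **costs)
        if d < best_dist:
            best_word, best_dist = candidate, d
    return best_word if best_dist <= max_distance else None
```

A substitution weight could, for example, be made cheaper for character pairs that are assumed to be confused often, by passing `sub_cost=lambda a, b: 0.5 if {a, b} == {"m", "n"} else 1.0`. That particular pair and value are made up for illustration only.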


Similar resources

Semantics-based pretranslation for SMT using fuzzy matches

Semantic knowledge has been adopted recently for SMT preprocessing, decoding and evaluation, in order to be able to compare sentences based on their meaning rather than on mere lexical and syntactic similarity. Little attention has been paid to semantic knowledge in the context of integrating fuzzy matches from a translation memory with SMT. We present work in progress which focuses on semantic...

Fuzzy Automata: A Quantitative Review

Classical automata theory cannot deal with the system uncertainty. To deal with the system uncertainty the concept of fuzzy finite automata was proposed. Fuzzy automata can be used in diverse applications such as fault detection, pattern matching, measuring the fuzziness between strings, description of natural languages, neural network, lexical analysis, image processing, scheduling problem and...

Filtering Spam by Using Factors Hyperbolic Trees

Most of current Anti-spam techniques, like the Bayesian anti-spam algorithm, primarily use lexical matching for filtering unsolicited bulk E-mails (UBE) and unsolicited commercial E-mails (UCE). However, precision of spam filtering is usually low when the lexical matching algorithms are used in real dynamic environments. For example, an E-mail of refrigerator advertisements is useful for most f...

Fuzzy symbolic sensors - From concept to applications

This paper deals with sensors which compute and report linguistic assessments of numerical acquired values. Such sensors, called symbolic sensors, are particularly adapted when working with control systems which use artificial intelligence techniques. After having reconsidered some elements of the measurement theory, this paper sets the foundations of the symbolic sensors by introducing the mea...

Simple Finite - Fuzzy - Automaton Model 1

Many fuzzy automaton models have been introduced in the past. Here, we discuss two basic finite fuzzy automaton models, the Mealy and Moore types, for lexical analysis. We show that there is a remarkable difference between the two types. We consider that the latter is a suitable model for implementing lexical analysers. Various properties of fuzzy regular languages are reviewed and studied. A fuzz...

Lexical Reference: a Semantic Matching Subtask

Semantic lexical matching is a prominent subtask within text understanding applications. Yet, it is rarely evaluated in a direct manner. This paper proposes a definition for lexical reference which captures the common goals of lexical matching. Based on this definition we created and analyzed a test dataset that was utilized to directly evaluate, compare and improve lexical matching models. We ...



Publication date: 2012